Abstract

Given a vertex-weighted tree T, the split of an edge xy in T is $\min\{s_x(xy), s_y(xy)\}$ where $s_u(uv)$
is the sum of all weights of vertices that are closer to u than to v in T. Given a set of weighted
vertices V and a multiset of splits S, we consider the problem of constructing a tree on V whose
splits correspond to S. The problem is known to be NP-complete, even when all vertices have unit
weight and the maximum vertex degree of T is required to be no more than 4. We show that

• the problem is strongly NP-complete when T is required to be a path,

• the problem is NP-complete when all vertices have unit weight and the maximum degree of T
is required to be no more than 3, and

• it remains NP-complete when all vertices have unit weight and T is required to be a caterpillar
with unbounded hair length and maximum degree at most 3.

We also design polynomial time algorithms for

• the variant where T is required to be a path and the number of distinct vertex weights is
constant, and

• the variant where all vertices have unit weight and T has a constant number of leaves.

The latter algorithm is not only polynomial when the number of leaves, k, is a constant, but also
fixed-parameter tractable when parameterized by k.

Finally, we briefly discuss the problem when the vertex weights are not given but can be freely
chosen by an algorithm.

The considered problem is related to building libraries of chemical compounds used for drug design
and discovery. In these inverse problems, the goal is to generate chemical compounds having desired
structural properties, as there is a strong correlation between structural properties, such as the
Wiener index, which is closely connected to the considered problem, and biological activity.



1 Introduction 




In this paper, we consider trees $T = (V, E)$ where integer weights are associated to vertices by a function
$\omega : V \to \mathbb{N}$, where $\mathbb{N}$ denotes the set of natural numbers excluding 0.

Definition 1. Let T be a tree and $\omega : V \to \mathbb{N}$ be a function. The split of an edge e in T is the
minimum of $\Omega(T_1)$ and $\Omega(T_2)$, where $T_1$ and $T_2$ are the two trees obtained by deleting e from T, and
$\Omega(T_i) = \sum_{v \in T_i} \omega(v)$. We use $S(T)$ to denote the multiset of splits of T.
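To make Definition 1 concrete, the following Python sketch computes the multiset of splits of a small vertex-weighted tree. The function name `tree_splits` and the input encoding (a weight dictionary plus an edge list) are assumptions made for illustration only; they do not come from the paper.

```python
from collections import defaultdict


def tree_splits(weights, edges):
    """Return the sorted multiset of splits of a vertex-weighted tree.

    weights: dict mapping each vertex to a positive integer weight
    edges:   list of (u, v) pairs forming a tree on the keys of weights
    """
    adjacency = defaultdict(list)
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    total = sum(weights.values())
    root = next(iter(weights))
    parent = {root: None}
    order = [root]
    for u in order:  # breadth-first traversal; the list grows while we iterate
        for v in adjacency[u]:
            if v not in parent:
                parent[v] = u
                order.append(v)

    # Accumulate subtree weights bottom-up; each non-root vertex v stands
    # for the edge {v, parent[v]}, whose removal separates subtree(v).
    subtree = dict(weights)
    splits = []
    for v in reversed(order):
        if parent[v] is not None:
            subtree[parent[v]] += subtree[v]
            splits.append(min(subtree[v], total - subtree[v]))
    return sorted(splits)


# A path a-b-c with weights 1, 2, 4 has splits min(1, 6) and min(3, 4).
print(tree_splits({"a": 1, "b": 2, "c": 4}, [("a", "b"), ("b", "c")]))
```

Each edge is visited once, so the sketch runs in time linear in the number of vertices (plus the final sort).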

We consider the problem of reconstructing a tree with a given multiset of splits and a given set of weighted
vertices.

Weighted Splits Reconstruction (WSR): Given a set V of n vertices, a weight function
$\omega : V \to \mathbb{N}$, and a multiset S of integers, is there a tree T whose multiset of splits is S (i.e.
$S(T) = S$)?

The Weighted Splits Reconstruction for Trees of Maximum Degree k problem (WSRk) is
defined in the same way, except that we restrict the tree T to have maximum degree at most k. When
we require T to belong to a class of trees $\mathcal{T}$, the problem is called Weighted Splits Reconstruction
for $\mathcal{T}$.

When $\omega$ assigns unit weights to the vertices, the problem is simply called Splits Reconstruction
(SR). The Splits Reconstruction for Trees of Maximum Degree k problem (SRk) and the
Splits Reconstruction for $\mathcal{T}$ are the obvious unweighted counterparts of the weighted variants
defined above.

Related Work. In the field of Chemical Graph Theory [3J [T!5], molecules are modeled by graphs 
in order to study the physical properties of chemical compounds. A chemical graph is a graph, where 
vertices represent atoms of a chemical compound and edges the chemical bonds between them. Within 
the area of quantitative structure-activity relationship (QSAR), several structural measures of chemical 
graphs were identified that quantitatively correlate with a well defined process, such as biological activity 
or chemical reactivity. Probably the most widely known example is the Wiener index (see [13]): the 
sum of the distances in a graph between each pair of vertices, where the distance between two vertices 
is the length (the number of edges) of a shortest path from one to the other. Wiener [20] found a strong
correlation between the boiling points of paraffins and the Wiener index. From then on, many other 
topological (using the information of the chemical graph) and topographical (using the information of 
the chemical graph and the location of its vertices in space) indices were introduced and their correlation 
with various other biological activities was investigated. 

In Combinatorial Chemistry, drug design is facilitated by building libraries of molecules that are 
structurally related (via the Wiener index or any of the other numerous indices). We face inverse 
problems where the goal is to design new compounds that have a prescribed structural information (see 
also [5]). 

Goldman et al. [12] study problems related to the design of combinatorial libraries for drug design 
from an algorithmic and complexity-theoretic point of view, following the heuristic approaches of [18]
and [IT] . Goldman et al. show that for every positive integer W, except 2 and 5, there exists a graph 
with Wiener index W. They also show that every integer, except a finite set, is the Wiener index of 
some tree. For constructing a tree (of unbounded or bounded maximum degree) with a given Wiener 
index, they devise pseudo-polynomial dynamic programming algorithms. Goldman et al. also introduce 
the Splits Reconstruction problem and recall a result due to Wiener [20]: the Wiener index of a
tree T on n vertices with unit weights is $\sum_{s \in S(T)} s \cdot (n - s)$. They show that SR is NP-complete and
give an exponential-time algorithm without running time analysis. 
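Wiener's identity can be checked numerically on small trees. The sketch below computes the Wiener index of an unweighted tree in two ways: directly, by summing breadth-first distances over all vertex pairs, and through the splits formula $\sum_{s \in S(T)} s \cdot (n - s)$. The function names and the convention that vertices are numbered 0 to n-1 are assumptions made for this illustration.

```python
from collections import deque


def _adjacency(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def wiener_by_distances(n, edges):
    """Sum of distances over all unordered vertex pairs (BFS from each vertex)."""
    adj = _adjacency(n, edges)
    total = 0
    for source in range(n):
        dist = [-1] * n
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist)
    return total // 2  # every pair was counted twice


def wiener_by_splits(n, edges):
    """Sum of s * (n - s) over the splits s of a unit-weight tree."""
    adj = _adjacency(n, edges)
    parent = [-1] * n
    seen = [False] * n
    seen[0] = True
    order = [0]
    for u in order:  # breadth-first order rooted at vertex 0
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v] = u
                order.append(v)
    size = [1] * n
    total = 0
    for v in reversed(order):
        if parent[v] >= 0:
            size[parent[v]] += size[v]
            s = min(size[v], n - size[v])
            total += s * (n - s)
    return total


path4 = [(0, 1), (1, 2), (2, 3)]
print(wiener_by_distances(4, path4), wiener_by_splits(4, path4))
```

The identity holds because an edge with split s lies on exactly s · (n - s) shortest paths, one for each pair of vertices it separates.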

As it is not reasonable to construct chemical trees with arbitrarily high vertex degrees, Li and Zhang 
[To] studied SR4 and showed that it is also NP-complete. Their algorithm to construct a tree with 
maximum degree at most 4 to solve SR4 runs in exponential time (no running time analysis is provided) 
and creates weighted vertices in intermediate steps. 

In order to reconstruct glycans or carbohydrate sugar chains, Aoki-Kinoshita et al. [1] study the re- 
construction of a node-labeled supertree from a set of node-labeled subtrees. They give a 6-approximation 
algorithm for this problem, which generalizes the smallest superstring problem. 

We refer to [1] surveying results on the Wiener index for trees. 

Our Results. By the result of Li and Zhang [IB], SR4 is NP-complete, while SR2 is trivially in P. 
We close this gap by showing that SR3 is NP-complete by a reduction from Numerical Matching 
with Target Sums (defined below). It is even NP-complete for caterpillars with unbounded hair 
length. Identifying small classes of trees for which the problem is NP-complete may be important for 
future investigations in the spirit of the deconstruction of hardness proofs [15] which aim at identifying 
parameters for which the problem becomes tractable when these parameters are small. 

Our main result proves that WSR2 is strongly NP-complete by a reduction from a variant of Numerical
Matching with Target Sums in which all integers of the input are distinct. For the case
where the weights of the vertices are chosen from a small set of values, our dynamic-programming algorithm
solves WSR2 in time $O(n^{k+3} \cdot k)$, where k is the number of distinct vertex weights. Although this
running time is polynomial for every constant k, the degree of the polynomial depends on k. Thus, the



running time becomes impractical, even for small values of k. Parameterized complexity [51 [5J [T7] is a 
theoretical framework that allows one to distinguish between running times of the form $f(k) n^{g(k)}$ where the
degree of the polynomial depends on the parameter k and running times of the form $f(k) n^{O(1)}$ where the
exponential explosion of the running time is restricted to the parameter only. The fundamental hierarchy
of parameterized complexity classes is

$\mathrm{FPT} \subseteq W[1] \subseteq W[2] \subseteq \dots \subseteq \mathrm{XP}$,

where a parameterized problem is in FPT (fixed-parameter tractable) if there is a function f such that
the problem can be solved in time $f(k) n^{O(1)}$, a problem is in XP if there are functions f, g such that
the problem can be solved in time $f(k) n^{g(k)}$, and W[t], $t \ge 1$, are parameterized intractability classes
giving strong evidence that a parameterized problem that is hard for any of these classes is not in FPT.
Our algorithm for WSR2 parameterized by the number of distinct vertex weights places this problem
in XP. A generalization of this problem is W[1]-hard [7], but it remains open whether this problem
is fixed-parameter tractable. As a relevant parameter for SR we identified k, the number of leaves in
the reconstructed tree. This parameterization of SR can be solved in time $O(8^{k \log k} \cdot n)$, and is thus
fixed-parameter tractable.

Definitions. A caterpillar is a tree consisting of a path, called its backbone, and paths attached with 
one end to the backbone. Its hair length is the maximum distance (in terms of the number of edges) 
from a leaf to the closest vertex of the backbone. A star $K_{1,k}$ is a tree with k leaves and one internal
vertex, called the center. In our hardness proofs, we reduce from the following problem (problem [SP17] 
in 0). 

Numerical Matching with Target Sums (NMTS): Given three disjoint multisets A, B,
and $S = \{s_1, \dots, s_m\}$, each containing m elements from $\mathbb{N}$, can $A \cup B$ be partitioned into m
disjoint sets $C_1, C_2, \dots, C_m$, each containing exactly one element from each of A and B, such
that, for $1 \le i \le m$, $\sum_{c \in C_i} c = s_i$?
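For intuition about NMTS, a brute-force decision procedure is easy to state. The sketch below fixes the order of the targets, tries every ordering of A, and checks whether the values still needed to reach the targets form exactly the multiset B. It runs in exponential time and is meant only to make the problem statement concrete; the function name `solve_nmts` is a hypothetical label, not part of the paper.

```python
from collections import Counter
from itertools import permutations


def solve_nmts(a, b, s):
    """Return a list of triples (a_i, b_j, s_k) with a_i + b_j = s_k, or None."""
    if not (len(a) == len(b) == len(s)):
        return None
    b_count = Counter(b)
    for a_perm in permutations(a):
        # If a_perm[i] is matched with target s[i], this is the partner it needs.
        needed = [target - x for x, target in zip(a_perm, s)]
        if Counter(needed) == b_count:
            return [(x, y, t) for x, y, t in zip(a_perm, needed, s)]
    return None


print(solve_nmts([1, 2], [4, 7], [6, 8]))
```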

Organization. The remainder of this paper is organized as follows. Section 2 shows that WSR2 is NP-complete.
On the positive side, we show in Section 3 that WSR2 can be solved in polynomial time when
the number of distinct vertex weights is bounded by a constant. That this result cannot be extended
to WSR3 is shown in Section 4: SR3 remains NP-complete. Section 5 gives an FPT-algorithm for SR
parameterized by the number of leaves of the reconstructed tree. The variant where the vertex weights
are freely choosable is discussed in Section 6 and we conclude with some directions for future research in
Section 

2 WSR2 is strongly NP-complete 

In this section, we show that WSR2 is strongly NP-complete. First we introduce a new problem that is 
polynomial-time-reducible to WSR2, and then show that this new problem is strongly NP-hard. 

Scheduling with Common Deadlines (SCD): Given n jobs with positive integer lengths
$j_1, \dots, j_n$ and n deadlines $d_1 \le \dots \le d_n$, can the jobs be scheduled on two processors $P_1$ and
$P_2$ such that at each deadline there is a processor that finishes a job exactly at this time, and
processors are never idle between the execution of two jobs?

To reinforce the intuition on this problem one may imagine that we want to satisfy delivery deadlines 
and avoid using any warehouse space to store a product between its fabrication and the delivery date. 
There is no restriction as to which product should be delivered at a given time. (Another possibility is 
imagining computer scientists scheduling paper production to fit conference deadlines.) 

Given an instance $(j_1, \dots, j_n, d_1, \dots, d_n)$ for SCD, we construct an instance for WSR2 as follows.
For each job $j_i$, $1 \le i \le n$, create a vertex $v_i$ with weight $\omega(v_i) = j_i$. For each deadline $d_i$, $1 \le i \le n-1$,
create a split $d_i$. We may assume that $\sum_{i=1}^{n} j_i = d_{n-1} + d_n$, otherwise we trivially face a No-instance.

Suppose the path $P = (v_{\pi(1)}, v_{\pi(2)}, \dots, v_{\pi(n)})$ is a solution to WSR2. Say $\{v_{\pi(\ell)}, v_{\pi(\ell+1)}\}$ is the edge
associated to the split $d_{n-1}$. We construct a solution for SCD by assigning the jobs $j_{\pi(1)}, j_{\pi(2)}, \dots, j_{\pi(\ell)}$
to processor $P_1$, and the jobs $j_{\pi(n)}, j_{\pi(n-1)}, \dots, j_{\pi(\ell+2)}, j_{\pi(\ell+1)}$ to processor $P_2$, in this order. Note that
then, one of the jobs $j_{\pi(\ell)}, j_{\pi(\ell+1)}$ ends at $d_{n-1}$, and the other at $-d_{n-1} + \sum_{i=1}^{n} j_i = d_n$, which is as
desired.

On the other hand, if SCD has a solution, then WSR2 has a solution as well, because the previous
construction is easily inverted. Visually, the list of jobs of $P_2$ is reversed and appended to the list of jobs
of $P_1$. Job lengths correspond to vertex weights and deadlines correspond to splits (the last deadline
where a job from $P_1$ finishes is merged with the last deadline where a job from $P_2$ finishes). Thus, SCD
is polynomial-time-reducible to WSR2.

Lemma 2. SCD $\le_p$ WSR2.
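The correspondence behind Lemma 2 can be illustrated on a toy schedule. In the sketch below, a schedule is given as the two job lists of the processors in execution order; the path is obtained by appending the reversed list of the second processor to the list of the first, as described above. The helper names are hypothetical, and the example assumes, as in the text, that the total job length equals $d_{n-1} + d_n$.

```python
from itertools import accumulate


def path_splits(weights):
    """Sorted splits of the path whose vertices carry the given weights, in order."""
    total = sum(weights)
    prefix_sums = list(accumulate(weights))[:-1]  # one prefix sum per edge
    return sorted(min(p, total - p) for p in prefix_sums)


def finishing_times(jobs_p1, jobs_p2):
    """Sorted completion times of all jobs when neither processor idles."""
    return sorted(list(accumulate(jobs_p1)) + list(accumulate(jobs_p2)))


def schedule_to_path(jobs_p1, jobs_p2):
    """Jobs of the first processor, followed by those of the second one reversed."""
    return list(jobs_p1) + list(reversed(jobs_p2))


p1, p2 = [2, 3], [1, 5]
deadlines = finishing_times(p1, p2)           # d_1 <= ... <= d_n
splits = path_splits(schedule_to_path(p1, p2))
print(deadlines, splits)
```

In this example the splits of the resulting path are exactly the deadlines with the largest one removed, which is the instance of WSR2 built in the reduction.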

In the remainder of this section, we show that dNMTS is polynomial-time-reducible to SCD. The dNMTS 
problem is equal to the NMTS problem, except that all integers in A U B U S are pairwise distinct. This 
variant has been shown to be strongly NP-hard by Hulett et al. [H]. As the proof becomes somewhat 
simpler, we use dNMTS instead of NMTS for our reduction. 

Let us first give a high level description of the main ideas of the reduction. For a dNMTS instance 
(A, B, S), the elements of $A \cup B$ will be encoded as jobs, and the elements of S will be encoded as deadlines.
A convenient way to represent an element $s \in S$ is by introducing segments which are delimited to the
left and the right by double deadlines, and whose distance is equivalent to s. The elements of $A \cup B \cup S$
are blown up by well-chosen additive factors that preserve solutions and make sure that the length of 
each segment can only be met by the sum of exactly two job-lengths, one corresponding to an element 
of A and the other to an element of B. 

Our reduction will create an instance whose solution assigns, in each segment, one x-job (a job 
corresponding to an A-element) and one y-job (a job corresponding to a B-element) to the same processor,
such that these two jobs are the only jobs executed on this processor in this segment, thus providing a
solution to dNMTS. Without loss of generality, the x-job is scheduled first. As we must not introduce
any restriction on which x-jobs can be assigned to which segments, we introduce a deadline for each length
of an x-job; these are the real deadlines. We refer to the x- and y-jobs as green jobs. The job lengths 
were blown up such that in each segment, exactly one processor starts with a green x-job, and in each 
segment, exactly one processor ends by executing a green y-job. In each segment, the green jobs must not 
overlap; this is achieved by multiplying all deadlines created so far and the corresponding job lengths by 
a factor 2, and introducing fake deadlines at odd positions one unit before the real deadlines. If an x-job 
and a y-job overlapped, there would be no job ending at the fake deadline preceding the real deadline at 
which the x-job ends, as all green jobs have even length and all real deadlines and double deadlines are 
even. Blue, red, and black jobs are created to meet all deadlines on the processor that is not currently 
executing green jobs. The blow-up of the elements of A U B U S ensures that these jobs cannot equate 
the green jobs (except the black jobs whose lengths might equal the lengths of green y-jobs, but, without 
loss of generality, one can assign them to the last part of each segment of the processor not executing 
a green job). That none of these jobs is executed between two green jobs within a segment is ensured 
as the sum of all green job lengths equals the sum of the lengths of the segments. This summarizes the 
reduction and gives the reasons for the different elements of the construction. Let us now turn to the 
formal reduction. 

Let (A, B, S) be an instance for dNMTS. We suppose, without loss of generality, that $\sum_{i=1}^{m} s_i = \sum_{x \in A \cup B} x$,
otherwise (A, B, S) is trivially a No-instance for dNMTS. Let $A = \{a_1, \dots, a_m\}$ and $B = \{b_1, \dots, b_m\}$.
We also assume, without loss of generality, that $a_i < a_{i+1}$, $b_i < b_{i+1}$, and $s_i < s_{i+1}$, for all
$i \in \{1, \dots, m-1\}$, that $a_m < b_m$, and that $s_m < a_m + b_m$.

First, we construct an equivalent instance (X, Y, Z) for dNMTS. Each of $X := \{x_1, \dots, x_n\}$,
$Y := \{y_1, \dots, y_n\}$, and $Z := \{z_1, \dots, z_n\}$ has $n := m + 1$ elements:

for $i \in \{1, \dots, n-1\}$,
$x_i := 2 \cdot (a_i + (b_m + 2))$,
$y_i := 2 \cdot (b_i + 3 \cdot (b_m + 2))$,
$z_i := 2 \cdot (s_i + 4 \cdot (b_m + 2))$, and

The elements of X, Y, and Z have the following properties. 

Property 1. Each element of X U Y U Z is an even positive integer. 

Property 2. For every $i \in \{1, \dots, n-1\}$, we have that $x_i < x_{i+1}$, that $y_i < y_{i+1}$, and that $z_i < z_{i+1}$.

Figure 1: How jobs are assigned to processors in the SCD instance in segment j < n.



Property 3. For every $i \in \{1, \dots, n\}$, we have

$2 \cdot b_m + 4 < x_i \le 4 \cdot b_m + 4$,
$6 \cdot b_m + 12 < y_i \le 8 \cdot b_m + 14$, and
$8 \cdot b_m + 16 < z_i \le 12 \cdot b_m + 18$.

In particular, Property 3 implies that $y_1 > x_n$, that $z_1 > y_n$, and that $2 \cdot y_1 > z_n$. Properties 1-3 easily
follow by construction of X, Y, and Z.

Property 4. If k and $\ell$ are integers such that $x_k + y_\ell = z_n$, then $k = \ell = n$.

Property 4 holds because $x_n$ and $y_n$ are the only elements of X and Y, respectively, that are large enough
to sum to $z_n$.

Property 5. Let $p, q \in X \cup Y$, $p < q$, and $z \in Z$. If $p + q = z$, then $p \in X$ and $q \in Y$.

By Property 3, the sum of any two X-elements is smaller and the sum of any two Y-elements is larger
than any element of Z.
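The effect of the blow-up can be observed on a toy instance. The following sketch applies the three formulas for the indices $i \le n-1$ to a small dNMTS instance and checks evenness, the bounds of Property 3, and the separation used for Property 5. It deliberately covers only the elements derived from A, B, and S; the additional n-th elements are not modeled. The function name `blow_up` and the sample instance are assumptions for illustration.

```python
def blow_up(a, b, s):
    """Apply x = 2(a + (b_m + 2)), y = 2(b + 3(b_m + 2)), z = 2(s + 4(b_m + 2))."""
    a, b, s = sorted(a), sorted(b), sorted(s)
    shift = b[-1] + 2  # b_m + 2, where b_m is the largest element of B
    x = [2 * (ai + shift) for ai in a]
    y = [2 * (bi + 3 * shift) for bi in b]
    z = [2 * (si + 4 * shift) for si in s]
    return x, y, z


def check_properties(b_max, x, y, z):
    """Evenness, the bounds of Property 3, and the gap behind Property 5."""
    even = all(v % 2 == 0 for v in x + y + z)
    bounds = (
        all(2 * b_max + 4 < v <= 4 * b_max + 4 for v in x)
        and all(6 * b_max + 12 < v <= 8 * b_max + 14 for v in y)
        and all(8 * b_max + 16 < v <= 12 * b_max + 18 for v in z)
    )
    # Two x-values are too small and two y-values too large to hit any target.
    separated = 2 * max(x) < min(z) and 2 * min(y) > max(z)
    return even and bounds and separated


# Toy instance with pairwise distinct integers: 2 + 4 = 6 and 1 + 7 = 8.
x, y, z = blow_up([1, 2], [4, 7], [6, 8])
print(x, y, z, check_properties(7, x, y, z))
```

Because every a-element is shifted by $b_m + 2$, every b-element by $3(b_m + 2)$, and every s-element by $4(b_m + 2)$ before doubling, a pair $a_i + b_j = s_k$ holds exactly when $x_i + y_j = z_k$.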

For our SCD instance, we create the following deadlines:

• real deadlines: $r_{i,j} := x_i + \sum_{k=1}^{j} z_k$, for each $j \in \{0, \dots, n-1\}$ and each $i \in \{1, \dots, n\}$,

• fake deadlines: $f_{i,j} := r_{i,j} - 1$, for each $j \in \{0, \dots, n-1\}$ and each $i \in \{1, \dots, n\}$, and

• sum deadlines: two deadlines $ds_{1,j} := ds_{2,j} := \sum_{k=1}^{j} z_k$, for each $j \in \{1, \dots, n\}$.

The sum deadlines we just defined partition the interval $[0, ds_{1,n}]$ into n segments $I_j := [ds_{1,j-1}, ds_{1,j}]$,
$j = 1, \dots, n$, where for convenience, we let $ds_{1,0} = 0$. We create jobs with the following lengths, where
$x_0 = 0$:

• green x-jobs: $x_i$, for each $i \in \{1, \dots, n\}$,

• green y-jobs: $y_i$, for each $i \in \{1, \dots, n\}$,

• blue jobs: $n \cdot (n-1)$ times a job of length 1,

• red fill jobs: $n-1$ times a job of length $x_i - 1 - x_{i-1}$, for each $i \in \{1, \dots, n\}$,

• red overlap jobs: $x_i - x_{i-1}$, for each $i \in \{1, \dots, n\}$,

• black fill jobs: $z_i - x_n$ for $i \in \{1, \dots, n-1\}$, and

• a black overlap job: $z_n - x_n + 1$.



To illustrate these definitions, we start by showing that if we have a YES-instance (X, Y, Z) for dNMTS,
then we have an SCD YES-instance as well. Let $C_1, C_2, \dots, C_n$ be n couples such that $C_j =
\{x_{\pi_1(j)}, y_{\pi_2(j)}\}$ and $x_{\pi_1(j)} + y_{\pi_2(j)} = z_j$, $j \in \{1, \dots, n\}$, for two permutations $\pi_1$ and $\pi_2$ of the set
$\{1, \dots, n\}$. We construct a solution for SCD. Let us construct the schedules for $P_1$ and $P_2$. For each
$j \in \{1, \dots, n-1\}$,

• assign the green x-job $x_{\pi_1(j)}$ to the interval $[ds_{1,j-1}, r_{\pi_1(j),j-1}]$ of $P_1$,

• assign the green y-job $y_{\pi_2(j)}$ to the interval $[r_{\pi_1(j),j-1}, ds_{1,j}]$ of $P_1$,

• assign a red fill job of length $x_1 - 1$ to the interval $[ds_{1,j-1}, f_{1,j-1}]$ of $P_2$,

• for every $i \in \{1, \dots, n-1\} \setminus \{\pi_1(j)\}$, assign a red fill job of length $x_{i+1} - 1 - x_i$ to the interval
$[r_{i,j-1}, f_{i+1,j-1}]$ of $P_2$,

• for every $i \in \{1, \dots, n\} \setminus \{\pi_1(j)\}$, assign a blue job to the interval $[f_{i,j-1}, r_{i,j-1}]$ of $P_2$,

• assign a red overlap job of length $x_{\pi_1(j)+1} - x_{\pi_1(j)}$ to the interval $[f_{\pi_1(j),j-1}, f_{\pi_1(j)+1,j-1}]$ of $P_2$,
and

• assign a black fill job of length $z_j - x_n$ to the interval $[r_{n,j-1}, ds_{1,j}]$ of $P_2$.

It only remains to assign jobs to the last segment. The last segment of $P_1$ contains the green x-job $x_n$
and the green y-job $y_n$, in this order. The last segment of $P_2$ contains a red fill job of length $x_1 - 1$, a
blue job, a red fill job of length $x_2 - 1 - x_1$, a blue job, ..., a red fill job of length $x_n - 1 - x_{n-1}$, and
the black overlap job, in this order. See Fig. 1 for an illustration.

Now suppose the SCD instance is a YES-instance. We will show some structural properties of any
valid assignment of jobs to the processors, which will help to extract a solution for our original dNMTS
instance. We will show that in each segment $I_j$, any valid solution for the SCD instance has exactly one
green x-job $x_k$ and exactly one green y-job $y_\ell$, and that $x_k$ and $y_\ell$ sum to $z_j$.

Consider a valid assignment of the jobs to the processors $P_1$ and $P_2$. As two jobs with the same
length are interchangeable, when we encounter a job whose length belongs to more than one category
(for example "black fill" and "green y") we may choose in this case, without loss of generality, to which
category the job belongs.

Claim 1. A black fill job is assigned to each interval $[r_{n,j}, ds_{1,j+1}]$ with $j \in \{0, \dots, n-2\}$.

Proof. Let $j \in \{0, \dots, n-2\}$. Two jobs must finish at the double deadline $ds_{1,j+1}, ds_{2,j+1}$. One of these
must start at $r_{n,j}$ and thus has length $ds_{1,j+1} - r_{n,j} = \sum_{k=1}^{j+1} z_k - x_n - \sum_{k=1}^{j} z_k = z_{j+1} - x_n$. So this
job is, without loss of generality, a black fill job. □

This uses up all black fill jobs.

Claim 2. The green y-job $y_n$ is assigned to the interval $[r_{n,n-1}, ds_{1,n}]$.

Proof. As in the previous proof, one job must be assigned to this interval, whose length is $\sum_{k=1}^{n} z_k -
x_n - \sum_{k=1}^{n-1} z_k = z_n - x_n$, which is $y_n$ by Property 4. Thus, the green y-job $y_n$ is assigned to the interval
$[r_{n,n-1}, ds_{1,n}]$. □

Claim 3. The black overlap job is assigned to the interval $[f_{n,n-1}, ds_{1,n}]$.

Proof. As $r_{n,n-1}$ is the only deadline between $f_{n,n-1}$ and $ds_{1,n}$, the processor that does not use this
deadline needs to process a job finishing at $ds_{1,n}$ and starting before $r_{n,n-1}$. This is the black overlap
job, since no other job is long enough. It is assigned to the interval $[f_{n,n-1}, ds_{1,n}]$ of length $ds_{1,n} - f_{n,n-1} =
z_n - x_n + 1$. □

This uses up all black jobs. Now, the only jobs left whose length is between $6 b_m + 12$ and $8 b_m + 14$ are
the green y-jobs $y_1, \dots, y_{n-1}$.

Claim 4. For each $\ell \in \{1, \dots, n-1\}$, the green y-job $y_\ell$ is assigned to an interval $[r_{i,j-1}, ds_{1,j}]$ for some
$i \in \{1, \dots, n-1\}$ and $j \in \{1, \dots, n-1\}$.

Proof. Each job is assigned to an interval inside some segment, as the double deadlines prevent jobs from
spanning more than one segment. Suppose the green y-job $y_\ell$ is assigned to segment p. As $ds_{1,p} + y_\ell >
ds_{1,p} + x_n$, by Properties 2 and 3, and the deadline following $r_{n,p} = ds_{1,p} + x_n$ is $ds_{1,p+1}$, it must be that
the green y-job $y_\ell$ finishes at $ds_{1,p+1}$. Moreover, $ds_{1,p+1} - y_\ell$ is equal to a real deadline as $ds_{1,p+1} - y_\ell$
is even. □

Each of the 2n jobs that have been assigned so far finishes at a double deadline $ds_{1,j}, ds_{2,j}$. Thus, no other
jobs may end at a double deadline.

Claim 5. A red fill job of length $x_1 - 1$ is assigned to each interval $[ds_{1,j}, f_{1,j}]$ with $0 \le j \le n-1$.

Proof. Since both processors finish a job at deadline $ds_{1,j}$ (respectively, are initialized at time $ds_{1,0} = 0$)
and one of them finishes a job at the following deadline, which is $f_{1,j}$, we need to assign a job of length
$f_{1,j} - ds_{1,j} = x_1 - 1$ to the interval $[ds_{1,j}, f_{1,j}]$. Without loss of generality, this is one of the red fill jobs
of length $x_1 - 1$. □

This uses up all red fill jobs of length $x_1 - 1$.

Claim 6. For each $\ell \in \{1, \dots, n\}$, the green x-job $x_\ell$ is assigned to an interval $[ds_{1,j}, r_{i,j}]$ for some
$i \in \{1, \dots, n\}$ and $j \in \{0, \dots, n-1\}$.

Proof. Suppose the green x-job $x_\ell$ is assigned to segment p. Notice that $x_\ell > r_{n,p} - f_{1,p}$. Indeed
$r_{n,p} - f_{1,p} = x_n - x_1 + 1$ and, by construction, $x_n - x_1 + 1 < 2 b_m$, whereas $x_\ell > 2 b_m + 4$. Moreover, $r_{n,p}$
is the latest deadline in p. So the green x-job $x_\ell$ starts at $ds_{1,p}$. Notice that $ds_{1,p} + x_\ell < ds_{1,p+1}$ and
that $ds_{1,p} + x_\ell$ corresponds to a real deadline as $ds_{1,p} + x_\ell$ is even, but all fake deadlines are odd. □

By Claims 2, 4, and 6 and since we have the same number of segments as green x-jobs, respectively
green y-jobs, we obtain that each segment $I_j$, $1 \le j \le n$, contains exactly one green x-job and exactly
one green y-job.

Claim 7. For $j \in \{1, \dots, n\}$, the green x-job and the green y-job in the segment $I_j$ do not overlap.

Proof. Suppose otherwise, that is, suppose there is a $j \in \{1, \dots, n\}$ such that $I_j$ contains a green x-job,
say $x_\ell$, and a green y-job, say $y_k$, that overlap (i.e. the intervals they are assigned to overlap). Since $x_\ell$
ends at a real deadline by Claim 6 and $y_k$ starts at a real deadline by Claim 4, no job ends at the fake
deadline situated at $ds_{1,j-1} + x_\ell - 1$, which contradicts the validity of the SCD solution. □

The last claim implies that in each segment $I_j$, $1 \le j \le n$, there is a green x-job $x_{\ell_j}$ and a green y-job
$y_{k_j}$ which together have the same size as the interval. Hence the couples $C_j = \{a_{\ell_j}, b_{k_j}\}$, $1 \le j < n$, form
the desired solution of dNMTS. Thus, we have the following lemma.

Lemma 3. dNMTS $\le_p$ SCD.

We have assembled enough information to prove our main theorem. 

Theorem 4. WSR2 is strongly NP-complete. 

Proof. The theorem follows from the strong NP-hardness of dNMTS, Lemmas 2 and 3, and the membership
of WSR2 in NP, which is easily verified as the certificate is a path and an assignment of the splits
to its edges, all of which can be encoded in polynomial space. □

Corollary 5. Splits Reconstruction for Caterpillars of Unbounded Hair-Length and 
Maximum Degree 3 is NP-complete. 

Proof. It is clear that this problem, abbreviated SRC, is in NP. To show that it is hard for NP, we reduce
from WSR2. Let $I'_P = (\omega'_1, \dots, \omega'_{n-2}, s'_1, \dots, s'_{n-3})$ be an instance of WSR2, where $\omega'_i$, $1 \le i \le n-2$,
are the vertex weights and $s'_j$, $1 \le j \le n-3$, are the splits. We assume that all vertex weights and splits
are upper bounded by a polynomial in n; as WSR2 is strongly NP-hard, it is still NP-hard with this
restriction. Define $\Omega := 1 + 2 \cdot \max\{\omega'_i : 1 \le i \le n-2\}$. To simplify the argument, consider an auxiliary
instance $I_P = (\omega_1, \dots, \omega_n, s_1, \dots, s_{n-1})$ of WSR2 obtained from $I'_P$ by:

• augmenting the values of $s'_j$, $1 \le j \le n-3$, by $\Omega$,

• adding $\omega_{n-1} = \omega_n = \Omega$ to the multiset of weights,

• adding $s_{n-2} = s_{n-1} = \Omega$ to the multiset of splits,

• and finally, multiplying each value in $I_P$ by $n\Omega$ (so, for $1 \le i \le n-3$, $\omega_i = \omega'_i n\Omega$, and $s_i = (s'_i + \Omega) n\Omega$).

It is not difficult to see that $I_P$ and $I'_P$ are equivalent.

Now let us create an instance $I_C$ of SRC in the following way.

• replace each weight $\omega_i$, $1 \le i \le n$, by $\omega_i$ copies of weight 1,

• for each $\omega_i$, $1 \le i \le n$, add auxiliary splits $s_{l,i} = l$, $1 \le l \le \omega_i - 1$,

• keep the original splits $(s_1, \dots, s_{n-1})$.

Notice that there are $\sum_{i=1}^{n} \omega_i$ vertices and $(\sum_{i=1}^{n} \omega_i) - 1$ splits (i.e. edges) in total.
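The two-step construction above (first the auxiliary weighted instance, then the unit-weight instance) can be sketched as follows. The function name `wsr2_to_src` and the return format (number of unit-weight vertices plus the sorted list of splits) are hypothetical choices for illustration; the arithmetic follows the bullet points above.

```python
def wsr2_to_src(weights, splits):
    """Build the unit-weight caterpillar instance from a WSR2 instance.

    weights: the n-2 vertex weights of the original path instance
    splits:  the n-3 splits of the original path instance
    Returns (number of unit-weight vertices, sorted list of splits).
    """
    n = len(weights) + 2
    omega = 1 + 2 * max(weights)  # the constant called Omega in the text
    factor = n * omega

    # Auxiliary weighted instance: shift the splits by Omega, add two end
    # vertices and two end splits of value Omega, then scale by n * Omega.
    aux_weights = [w * factor for w in weights] + [omega * factor] * 2
    aux_splits = [(s + omega) * factor for s in splits] + [omega * factor] * 2

    # Unit-weight instance: every weight w becomes w unit vertices and
    # contributes the auxiliary splits 1, 2, ..., w - 1.
    num_vertices = sum(aux_weights)
    src_splits = list(aux_splits)
    for w in aux_weights:
        src_splits.extend(range(1, w))
    return num_vertices, sorted(src_splits)


# Original instance: a path on two vertices of weight 1 with the single split 1.
num_vertices, src_splits = wsr2_to_src([1, 1], [1])
print(num_vertices, len(src_splits), src_splits.count(1))
```

The instance size grows with the numeric values of the weights, which is why the reduction relies on the strong NP-hardness of WSR2, that is, on weights bounded by a polynomial in n.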

As one easily checks, if $I_P$ has a solution then $I_C$ has a solution. Now suppose $I_C$ has a solution
C. Then, as C is a solution for SRC, it follows that C is a caterpillar of maximum degree 3 (with
unbounded hair-length). Call B the backbone of C. Let $B' \subseteq B$ be maximal such that its endvertices
have degree 3.

By construction $s_i > 1$, $1 \le i \le n-1$, and only the splits $s_{1,i}$, $1 \le i \le n$, have value 1. There are
exactly n such splits, and so, C must have exactly n leaves.

Since there is no split of value $n\Omega^2 + 1$, each hair of C has length at most $n\Omega^2$. So, as $s_i > n\Omega^2$ for
$i = 1, \dots, n-3$, we obtain that the splits $s_1, \dots, s_{n-3}$ are assigned to edges $b_1, \dots, b_{n-3}$ in $E(B')$. Observe
that the edges $b_1, \dots, b_{n-3}$ induce a connected graph (i.e. a path), as all other splits are smaller than
the minimum of the $s_i$, $i = 1, \dots, n-3$.

Let P be the path formed by the edges $b_1, \dots, b_{n-3}$ and let u and v be the two endpoints of P. There
is at most one vertex y such that if we look at the values of the splits of the edges from u to y (resp.
from v to y), then they are strictly increasing. In addition, if two edges of P share a vertex x, $x \ne y$,
then there must be a hair attached to x, because the splits associated to these two edges differ by more
than 1. Furthermore, there are hairs $H_1$ and $H_2$ of length $n\Omega^2$ attached to the first and to the last vertex
on the backbone, as no two auxiliary splits are large enough to add up to one of the original splits $s_i$,
$i = 1, \dots, n-3$. From the fact that C has exactly n leaves, it follows that the remaining hair has to be
attached to y. As a consequence, $E(B') = \{b_1, \dots, b_{n-3}\}$.

Let $B''$ be equal to $B'$ augmented with the two edges to which the splits of value $n\Omega^2$ are assigned.
All edges outside $B''$ (that is, edges from hairs) belong to auxiliary splits. This means that the edges
adjacent to $B''$ correspond to auxiliary splits $s_{\omega_i - 1, i}$.

In order to find a solution for $I_P$, it thus suffices to take $B''$ and replace all hairs with the corresponding
weight on their starting vertex on $B''$. □

3 Algorithm for WSR2 with few distinct vertex weights 

Let $k = |\{\omega(v) : v \in V\}|$ denote the number of distinct vertex weights in an instance $(V, \omega, S)$ for WSR2.
In this section, we exhibit a dynamic programming algorithm for WSR2 that works in polynomial time 
when k is a constant. Moreover, standard backtracking can be used to actually construct a solution, if 
one exists. 

Suppose $|V| = n$ and the multiset of splits, S, contains the splits $s_1 \le s_2 \le \dots \le s_{n-1}$. Let $w_1 <
w_2 < \dots < w_k$ denote the distinct vertex weights and $m_1, m_2, \dots, m_k$ denote their respective multiplicities,
i.e. $m_i = |\{v \in V : \omega(v) = w_i\}|$ for all $i \in \{1, 2, \dots, k\}$.

Our dynamic programming algorithm computes the entries of a boolean table A. The table A has an
entry $A[p, W_L, W_R, v_1, v_2, \dots, v_k]$ for each integer p with $1 \le p \le n-1$, each two integers $W_L, W_R \in S$,
and each $v_i \in \{0, 1, \dots, m_i\}$, where $i \in \{1, 2, \dots, k\}$. The entry $A[p, W_L, W_R, v_1, v_2, \dots, v_k]$ is set to true
iff there is an assignment of the splits $s_1, s_2, \dots, s_p$ to the $\ell$ leftmost edges and the r rightmost edges of
the path $P_n$ on n vertices, such that

• $p = \ell + r$;

• $v_1$ weights $w_1$, $v_2$ weights $w_2$, ..., and $v_k$ weights $w_k$ are assigned to the $\ell$ leftmost and the r
rightmost vertices of $P_n$ such that each split assigned to the left (respectively to the right) part of
the path corresponds to the sum of the vertex weights assigned to vertices to the left (respectively
to the right) of this split; and

• $W_L$ is equal to the value of the $\ell$-th split from the left and $W_R$ is equal to the r-th split from the
right.

Intuitively, our algorithm assigns splits and weights by starting from both endpoints of the path and 
trying to join these two sub-solutions. 

For the base case, set $A[0, W_L, W_R, v_1, v_2, \dots, v_k]$ to true if $W_L = W_R = v_1 = v_2 = \dots = v_k = 0$ and
to false otherwise. We compute the remaining entries of A by increasing values of p using the following
recurrence.

$$A[p, W_L, W_R, v_1, v_2, \dots, v_k] = \bigvee_{i=1}^{k} \Big( A[p-1, W_L - w_i, W_R, v_1, v_2, \dots, v_{i-1}, v_i - 1, v_{i+1}, v_{i+2}, \dots, v_k] \;\lor\; A[p-1, W_L, W_R - w_i, v_1, v_2, \dots, v_{i-1}, v_i - 1, v_{i+1}, v_{i+2}, \dots, v_k] \Big)$$

In the previous recurrence, the formulas that refer to table entries that are undefined have the value 
false. 

The final result of the algorithm is computed by evaluating the expression

$$\bigvee_{\substack{W_L, W_R \in S \\ i \in \{1, 2, \dots, k\} \\ (W_L \le w_i + W_R) \land (W_R \le w_i + W_L)}} A[|S|, W_L, W_R, m_1, m_2, \dots, m_{i-1}, m_i - 1, m_{i+1}, m_{i+2}, \dots, m_k].$$

Theorem 6. WSR2 can be solved in time $O(n^{k+3} \cdot k)$, where k is the number of distinct vertex weights
of any input instance $(V, \omega, S)$ and n is the number of vertices.

Proof. The correctness of the base case is clear. For the correctness of the recurrence, let $u_{pivot}$ be the
vertex on $P_n$ where the two sub-solutions corresponding to the left and to the right part of $P_n$ meet.
First note that the values of the splits increase from left to right until we encounter vertex $u_{pivot}$, from
which point they decrease. Filling up the path from both ends, this means that, reading the splits from
$s_1$ to $s_{n-1}$, we can assign them to the path, each time only deciding whether we assign it to the left
part or to the right part of the path (in the SCD model, this would be equivalent to deciding whether
to meet the next deadline on the processor $P_1$ or on the processor $P_2$). The first (respectively second)
case of the recurrence corresponds to assigning the next split to the left (respectively right) part of the
path by inserting a vertex of weight $w_i$, $i \in \{1,2,\ldots,k\}$. The correctness of the final evaluation follows
because it inserts the one missing vertex weight that has not been used between the left and the right
part of the path.

The table has $|S|^3 \cdot \prod_{i=1}^{k} (m_i + 1) \le n^{k+3}$ entries, each entry can be computed in time $O(k)$, and the
final evaluation takes time $O(n \cdot k)$. □
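
The two-ended filling argument can be illustrated by a small memoized search. This is only a sketch under stated assumptions, not the table $A$ itself: the name `path_realizes` is hypothetical, all weights are assumed to be positive integers, and the sketch makes explicit that a split placed on one side must equal the running weight sum on that side. The state (position in the sorted splits, left sum, right sum, remaining count per distinct weight) mirrors the table indices.

```python
from collections import Counter
from functools import lru_cache


def path_realizes(weights, splits):
    """Decide whether some ordering of `weights` along a path yields `splits`.

    Hypothetical helper: assumes positive integer weights.
    """
    if len(splits) != len(weights) - 1:
        return False
    distinct = sorted(set(weights))
    counts = Counter(weights)
    start = tuple(counts[w] for w in distinct)
    order = sorted(splits)  # splits are consumed from smallest to largest

    @lru_cache(maxsize=None)
    def search(p, w_left, w_right, remaining):
        if p == len(order):
            # Exactly one vertex is left: the pivot joining both parts.
            j = next(i for i, c in enumerate(remaining) if c)
            w = distinct[j]
            return w_left <= w + w_right and w_right <= w + w_left
        s = order[p]
        for i, w in enumerate(distinct):
            if remaining[i] == 0:
                continue
            rest = remaining[:i] + (remaining[i] - 1,) + remaining[i + 1:]
            # Extend the left part with a vertex of weight w ...
            if w_left + w == s and search(p + 1, s, w_right, rest):
                return True
            # ... or extend the right part with it.
            if w_right + w == s and search(p + 1, w_left, s, rest):
                return True
        return False

    return search(0, 0, 0, start)
```

For example, the weights 1, 2, 3 placed in this order on a path give the splits 1 and 3, so `path_realizes([1, 2, 3], [1, 3])` succeeds, whereas three unit-weight vertices cannot produce the splits 1 and 2.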

4 SR3 is NP-complete 

In this section we show that Splits Reconstruction with unit weights is NP-complete for trees with
maximum degree 3. Our polynomial-time reduction is from the strongly NP-complete NMTS
problem recalled in Section [T]. This problem remains NP-complete even if each integer of the NMTS
instance is at most $p(m)$, where $p$ is a polynomial and $m$ is the length of the description of the instance.
Note that the next theorem does not immediately follow from Corollary [5].

Theorem 7. SR3 is NP-complete. 

Proof. Let $\bar{A} = \{\bar{a}_1, \bar{a}_2, \ldots, \bar{a}_m\}$, $\bar{B} = \{\bar{b}_1, \bar{b}_2, \ldots, \bar{b}_m\}$ and $\bar{S} = \{\bar{s}_1, \bar{s}_2, \ldots, \bar{s}_m\}$ be an instance of NMTS.
Let $C = \max\{x : x \in \bar{A} \cup \bar{B}\}$. Without loss of generality, we construct the following equivalent NMTS
instance:

$a_i := \bar{a}_i + 2 + 3C$, $1 \le i \le m$,

$b_i := \bar{b}_i + 3 + 5C$, $1 \le i \le m$, and

$s_i := \bar{s}_i + 5 + 8C$, $1 \le i \le m$.

Let $A = \bigcup_{1 \le i \le m} \{a_i\}$, $B = \bigcup_{1 \le i \le m} \{b_i\}$, and $S = \bigcup_{1 \le i \le m} \{s_i\}$. Clearly, the instance $(\bar{A}, \bar{B}, \bar{S})$ has a
solution if and only if the instance $(A, B, S)$ has a solution.

Figure 2: A tree with maximum degree 3 representing a solution to an SR3 instance constructed as
described in the proof of Theorem [7].

Now we describe an instance $(V, S)$ of SR3, which is a YES-instance if and only if the previous
instance $(A, B, S)$ of NMTS is a YES-instance (see also Figure [2]).

Let $n = 2m - 2 + \sum_{i=1}^{m} a_i + \sum_{i=1}^{m} b_i$ be the number of vertices in the set $V$; we recall that these
vertices have unit weight. The multiset $S$ of splits is defined as follows.

• For each value $s_i$, $1 \le i \le m$, the value $1 + s_i$ is added to $S$ and we refer to these splits as red
splits.

• For each value $s_i$, $2 \le i \le m - 2$, the value $(i - 1) + \sum_{j=1}^{i} (1 + s_j)$ is added to $S$ and we refer to
these splits as black splits.

• For each value $a_i$, $1 \le i \le m$, the values $\{1, 2, \ldots, a_i\}$ are added to $S$ and we refer to these splits
as green splits.

• For each value $b_i$, $1 \le i \le m$, the values $\{1, 2, \ldots, b_i\}$ are added to $S$ and we refer to these splits as
blue splits.

Finally, each value $x$ of $S$ is replaced by $\min(x, n - x)$. As required, $S$ contains $n - 1$ splits.
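
The construction of the split multiset can be sketched in a few lines. This is a minimal illustration, assuming the NMTS instance is given as three equal-length lists of positive integers (the integers before the shift by multiples of $C$); the function name `sr3_instance` and its argument names are hypothetical.

```python
def sr3_instance(a_orig, b_orig, s_orig):
    """Build (n, splits) for the SR3 instance from an NMTS instance.

    Hypothetical helper: the three arguments are the NMTS integers
    before the shift by multiples of C.
    """
    m = len(a_orig)
    C = max(list(a_orig) + list(b_orig))
    # Shifted, equivalent NMTS instance.
    a = [x + 2 + 3 * C for x in a_orig]
    b = [x + 3 + 5 * C for x in b_orig]
    s = [x + 5 + 8 * C for x in s_orig]
    n = 2 * m - 2 + sum(a) + sum(b)

    splits = [1 + si for si in s]                      # red splits
    for i in range(2, m - 1):                          # black splits, i = 2..m-2
        splits.append((i - 1) + sum(1 + s[j] for j in range(i)))
    for x in a + b:                                    # green and blue splits
        splits.extend(range(1, x + 1))
    # Each value x is replaced by min(x, n - x).
    return n, sorted(min(x, n - x) for x in splits)
```

For the toy input `sr3_instance([1, 2, 3, 4], [4, 3, 2, 1], [5, 5, 5, 5])`, one gets $C = 4$, $n = 174$, and a multiset of $173 = n - 1$ splits containing the value 1 exactly $2m = 8$ times.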

Lemma 8. $(A, B, S)$ is a YES-instance for NMTS if and only if $(V, \omega : V \to \{1\}, S)$ is a YES-instance
for SR3.

Proof. Throughout the proof, when we refer to a split of value $x$, we mean a split of value $\min(x, n - x)$.

"$\Rightarrow$" Assume that $(A, B, S)$ is a YES-instance for NMTS. We will show that there is a solution to
SR3. A tree $T = (V, E)$ and a bijective function $b : E \to S$ can be constructed as follows (see also
Figure [2]). Construct a path $P$ with $m - 3$ edges carrying the black splits such that the $(i - 1)$-th edge is
associated with the black split $(i - 1) + \sum_{j=1}^{i} (1 + s_j)$, $i \in \{2, 3, \ldots, m - 2\}$. Add two edges incident to
the first vertex of $P$ that are associated with the red splits $1 + s_1$ and $1 + s_2$. Add two edges incident to
the last vertex of $P$ that are associated with the red splits $1 + s_{m-1}$ and $1 + s_m$. To the $i$-th vertex of $P$,
$2 \le i \le m - 3$, add one incident edge associated with the red split $1 + s_{i+1}$. Finally, for each $a_i \in A$ and
each $b_i \in B$, construct paths with $a_i$ and $b_i$ vertices, respectively. To each edge of these $2m$ paths,
we can associate a green or a blue split. It remains to attach one green path and one blue path to the
endpoint of each red edge that does not lie on $P$ (the other endpoint already lies on $P$ and has
degree 3). The way to attach these paths is given by the solution to the $(A, B, S)$ instance.

"$\Leftarrow$" Assume that there exists a solution $(T, b)$ to SR3, where $T = (V, E)$ is a tree of maximum
degree 3 and $b$ is a bijection from $E$ to $S$. We show how a solution of the NMTS instance $(A, B, S)$ can
be derived from $(T, b)$. Let us note that for any $i, j, k \in \{1, 2, \ldots, m\}$, we have that $a_i + s_j > s_k$, that
$a_i + a_j < s_k$, that $b_i + b_j > s_k$, and that $a_i + a_j > b_k$.
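
These four inequalities can be checked directly from the shifts by $2 + 3C$, $3 + 5C$ and $5 + 8C$. The short derivation below is a sketch under two assumptions that are not stated explicitly at this point: every NMTS integer before the shift is at least 1, and every target before the shift is at most $2C$ (a larger target could not be the sum of two integers bounded by $C$).

```latex
\begin{align*}
3 + 3C \le a_i \le 2 + 4C, \qquad 4 + 5C &\le b_i \le 3 + 6C, \qquad 6 + 8C \le s_i \le 5 + 10C, \\
a_i + s_j \ge 9 + 11C &> 5 + 10C \ge s_k, \\
a_i + a_j \le 4 + 8C &< 6 + 8C \le s_k, \\
b_i + b_j \ge 8 + 10C &> 5 + 10C \ge s_k, \\
a_i + a_j \ge 6 + 6C &> 3 + 6C \ge b_k.
\end{align*}
```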

Claim 8. For every $i \in \{1, 2, \ldots, m\}$, there is a path on $a_i$ edges, called the $a_i$-path, using the splits
$1, 2, \ldots, a_i$ (without loss of generality, they are green) and there is a path on $b_i$ edges, called the $b_i$-path,
using the splits $1, 2, \ldots, b_i$ (without loss of generality, they are blue). All these a-paths and b-paths are
edge-disjoint.

Proof. As the instance has $2m$ splits of value 1, $T$ has $2m$ leaves. Each of these leaves is incident to a
green or blue split of value 1. As the instance also has $2m$ splits of each of the values $2, 3, \ldots, 2 + 3C$, the
leaves of $T$ are the starting points of $2m$ edge-disjoint paths $P_1, P_2, \ldots, P_{2m}$, each having $2 + 3C$ edges
in $T$. Consider an $x \in A \cup B$ and the splits $2 + 3C + 1, 2 + 3C + 2, \ldots, x$. As $x < 4 + 6C$, and as there
is no split smaller than $2 + 3C$ other than those we have already used to form the paths $P_i$, $1 \le i \le 2m$,
the splits $2 + 3C + 1, 2 + 3C + 2, \ldots, x$ are assigned to an extension of a path $P_i$, which, together with
$P_i$, forms a path $P'_i$ with $x$ edges using the splits $1, 2, \ldots, x$. All these paths $P'_i$, $1 \le i \le 2m$, are
edge-disjoint and, without loss of generality, green splits are assigned to their edges if they have at most $2 + 4C$
edges and blue splits otherwise. □

Claim 9. For every $i \in \{1, 2, \ldots, m\}$, the red split of value $1 + s_i$ is assigned to an edge $e_i$ of $T$ whose
vertex $u_i$ is the common extremity of an a-path and a b-path, where $u_i$ is in the subtree of $T - e_i$ that
has $s_i + 1$ vertices.

Proof. As no split has value $s_i$, vertex $u_i$ is incident to another two edges besides $e_i$. We note that all
splits, besides those of the a- and b-paths, have value at least $6 + 8C$. One such split plus the smallest
$a_j$, $1 \le j \le m$, would exceed $s_i$. So, $u_i$ is the end point of two a/b-paths. These cannot be two a-paths
as $a_j + a_k < s_i$ for any $j, k \in \{1, 2, \ldots, m\}$, and they cannot be two b-paths as $b_j + b_k > s_i$ for any
$j, k \in \{1, 2, \ldots, m\}$. Thus, $u_i$ is the common extremity of an a-path and a b-path. □

Finally, a solution to the instance $(A, B, S)$ of NMTS is formed by the couples $C_1, C_2, \ldots, C_m$, where
each $C_i$ contains $a_{i_a}$ and $b_{i_b}$, where $i_a$ and $i_b$ are such that the edge $e_i$ of $T$, with $b(e_i) = 1 + s_i$, is
incident to the $a_{i_a}$-path and the $b_{i_b}$-path. This proves the NP-hardness of SR3. □

As the certificate is a tree on n vertices, the membership in NP is obvious and Theorem [7] is proved. □ 
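
Checking such a certificate amounts to computing subtree sizes. The following sketch assumes the certificate is given as a list of edges over the vertices $0, 1, \ldots, n-1$ (with $n \ge 1$); the function name `verify_sr3_certificate` and this encoding are illustrative assumptions, not part of the proof.

```python
from collections import Counter, defaultdict


def verify_sr3_certificate(n, edges, splits):
    """Check that `edges` form a tree of maximum degree 3 on vertices
    0..n-1 whose unit-weight splits are exactly the multiset `splits`."""
    if len(edges) != n - 1:
        return False
    adj = defaultdict(list)
    for u, v in edges:
        if not (0 <= u < n and 0 <= v < n):
            return False
        adj[u].append(v)
        adj[v].append(u)
    if any(len(adj[v]) > 3 for v in range(n)):
        return False

    # Depth-first search from vertex 0, recording the discovery order.
    parent = {0: None}
    order = [0]
    stack = [0]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                order.append(v)
                stack.append(v)
    if len(parent) != n:  # not connected, hence not a tree
        return False

    # Descendants are discovered after their ancestors, so the reversed
    # order accumulates subtree sizes bottom-up.
    size = [1] * n
    found = Counter()
    for v in reversed(order):
        if parent[v] is not None:
            size[parent[v]] += size[v]
            found[min(size[v], n - size[v])] += 1
    return found == Counter(splits)
```

The check runs in time linear in $n$, which is consistent with the membership argument above.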

5 Algorithm for SR with few leaves 

In this section we design an algorithm for SR parameterized by the number $k$ of splits that are equal
to one, i.e. $k = |\{s = 1 : s \in S\}|$. As each such split is incident to a leaf in a reconstructed tree, the
algorithm reconstructs trees with $k$ leaves.

The algorithm starts with a star $T$ with center $r$ and $k$ leaves. The vertex $r$ is also the root of $T$ and
$r$ is the only vertex which is allowed to have non-unit weight during the execution of the algorithm. We
start by setting $\omega(r) = n - k$. The splits that are equal to 1 are assigned to the edges of the star.

At any stage of the algorithm, $T$ is a tree with splits from $S$ assigned to its edges, and the goal is to
replace the root $r$ of $T$ by a tree $T_r$ with unit-weight vertices (except for the new root, which can have a
non-unit weight), using splits from $S$ that have not been assigned yet; the leaves of $T_r$ are made adjacent
to the former neighbors of $r$ in $T$. If there exists such a replacement where the splits form a subset of
$S$, we say that $T$ has a valid extension. Each tree $T$ uniquely defines a partition $(A, C, U)$ of the splits
$S$ such that

• $A$ represents the multiset of available splits that have not yet been assigned to $T$,

• $C$ represents the multiset of current splits assigned to edges incident to $r$, and

• $U$ represents the multiset of used splits assigned to edges of $T$ that are not incident to $r$.

Let $b$ denote the value of the smallest split in $C$. Our tree $T$ will grow out of $r$ as follows.

• If $\omega(r) = 1$, then return True. Indeed, $T$ uses all splits from $S$ as $A$ is empty.

• If $A$ contains a split whose value is at most $b$, then $T$ has no valid extension and the algorithm
backtracks. Indeed, on a path between two leaves in a valid tree, there is no split with value at
most $b$ between two splits with value at least $b$.

• If $|\{s \in A : s = b + 1\}| > |\{s \in C : s = b\}|$, that is, $A$ contains more splits with value $b + 1$ than
$C$ contains splits with value $b$, then $T$ has no valid extension and the algorithm backtracks. The
correctness of this case holds by the pigeonhole principle and the argument used in the previous
case.

• If $|\{s \in A : s = b + 1\}| = |\{s \in C : s = b\}|$, then all valid extensions of $T$ are also valid extensions
of the tree obtained from $T$ by subdividing each edge with split $b$ that is incident to $r$. That is, for
each edge $rv$ with a split of value $b$, add a new vertex $z_v$, remove the edge $rv$, and add edges $rz_v$
and $z_v v$. Decrement $\omega(r)$ by $|\{s \in C : s = b\}|$. The algorithm recursively solves the problem on
this tree.

• Otherwise (if $|\{s \in A : s = b + 1\}| < |\{s \in C : s = b\}|$), some split from $C$ with value $b$ receives a
parent split with value more than $b + 1$. Go over all choices for selecting a subset $U$ of $N(r)$ of size
at least 2 containing a vertex $v$ such that $rv$ is associated with a split with value $b$. If $A$ contains
no split that equals $1 + \sum_{u \in U} s(ru)$, where $s(e)$ denotes the split associated with the edge $e$ of $T$,
then discard this choice. Otherwise, create a new vertex $z_U$, remove the edges $\{ru : u \in U\}$ from
$T$, add the edges $\{z_U u : u \in U \cup \{r\}\}$, and decrement $\omega(r)$ by 1. The algorithm recursively solves
the resulting subproblems. If one such tree has a valid extension, $T$ has a valid extension.
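
The case distinction above depends only on the multisets $A$ and $C$ and on $\omega(r)$. The sketch below merely classifies a state into one of the five cases; it does not perform the recursion, and the function name `classify` and the returned labels are hypothetical.

```python
def classify(available, current, root_weight):
    """Return which case of the case distinction applies.

    `available` and `current` are lists holding the multisets A and C
    (with C non-empty); `root_weight` is the current weight of the root r.
    """
    if root_weight == 1:
        return "done"                 # all splits have been used
    b = min(current)                  # smallest split incident to r
    if any(s <= b for s in available):
        return "backtrack"            # an available split is at most b
    up = available.count(b + 1)       # splits of value b + 1 in A
    cur = current.count(b)            # splits of value b in C
    if up > cur:
        return "backtrack"            # pigeonhole argument
    if up == cur:
        return "subdivide"            # subdivide every edge with split b
    return "branch"                   # try subsets U of N(r)
```

For instance, with $A = \{2, 2\}$, $C = \{1, 1, 1\}$ and $\omega(r) = 3$ there are fewer available splits of value 2 than current splits of value 1, so the state falls into the branching case.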

Theorem 9. SR can be solved in time $O(8^{k \log k} \cdot n)$, where $k = |\{s = 1 : s \in S\}|$ and $n$ is the number
of vertices.

Proof. The arguments for correctness have been given in the description of the algorithm. For the running
time analysis, we observe that $\omega(r)$ decreases in each recursive call, no recursive call increases $|C|$, and
the time spent in each recursion step is linear. Let $T(c)$ denote an upper bound on the number of atomic
instances solved for an instance with $|C| = c \le k$, where an instance is atomic if the algorithm makes no
recursive call for solving the instance. In the only case making more than one recursive call, we have

$$
T(c) \le \sum_{i=2}^{c} \binom{c}{i} T(c - i + 1),
$$

as the set $U$ in the neighborhood $N(r)$ of $r$ is replaced by one vertex $z_U$. As $c \le k$ and $\binom{c}{i} \le k^i$, we have
that

$$
\begin{aligned}
T(c) &\le (c - 1) \cdot \max_{2 \le i \le c} \{c^i \cdot T(c - (i - 1))\} \\
&\le \max_{2 \le i \le k} \{k^{i+1} \cdot T(k - (i - 1))\} \\
&\le \max_{2 \le i \le k} \left(k^{i+1}\right)^{k/(i-1)}.
\end{aligned}
$$

This maximum is attained for $i = 2$, which proves the theorem. □
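
The bound can be sanity-checked numerically by evaluating the recurrence with equality. The sketch below assumes the base value $T(1) = 1$, which is not stated in the proof, and compares the result with $k^{3k} = 8^{k \log k}$; the name `atomic_bound` is hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def atomic_bound(c):
    """Evaluate T(c) = sum_{i=2}^{c} binom(c, i) * T(c - i + 1).

    Assumption: T(1) = 1, i.e. a single atomic instance remains.
    """
    if c <= 1:
        return 1
    return sum(comb(c, i) * atomic_bound(c - i + 1) for i in range(2, c + 1))
```

With this base value one obtains $T(2) = 1$, $T(3) = 4$ and $T(4) = 29$, all far below $k^{3k}$ for the corresponding $k$.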



6 Freely choosable weights 

We remark that the following modification of WSR makes any set of splits realizable in some tree.
Suppose the weight function $\omega$ is not given, but freely choosable, that is, we ask whether, given a
multiset $S$ of integers, there exists a tree $T = (V, E)$ and a weight function $\omega : V \to \mathbb{N}$, such that $S$ is
the multiset of splits of $T$. We call this problem ChWSR.

Theorem 10. ChWSR always admits a solution. 

Proof. We show that the answer to ChWSR is always yes: Decompose $S$ into $\kappa$ chains $s_1^j < s_2^j < \ldots < s_{m(j)}^j$,
$j = 1, \ldots, \kappa$, where $\kappa$ is the maximal multiplicity in $S$. Let $T$ be obtained from the star $K_{1,\kappa}$ by
subdividing $e_j$, the $j$-th edge of the star, $m(j) - 1$ times (for $j = 1, \ldots, \kappa$), and root $T$ at the center $r$ of
$K_{1,\kappa}$. Map $s_i^j$ to the edges of the subdivided $e_j$, $1 \le i \le m(j)$, keeping their order, so that the edge
corresponding to $s_1^j$ is incident to a leaf of $T$. Finally, choose the weight $\omega(r)$ for the root to be equal
to the maximum value in $S$. For each leaf $v$ of $T$, set the weight $\omega(v)$ equal to the split assigned to the
edge $\{v, u\}$, where $u$ is the parent of $v$. Any other vertex $v$ is given a weight equal to the difference of
splits assigned to the edges incident to $v$. This choice of $T$ and $\omega$ clearly satisfies the requirements. □
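
The construction in this proof is easy to carry out and to verify mechanically. The sketch below assumes that $S$ is a non-empty list of positive integers; the function names `realize_with_free_weights` and `weighted_splits`, and the encoding of the tree by parent pointers with the root as vertex 0, are illustrative choices.

```python
from collections import Counter


def realize_with_free_weights(splits):
    """Build the subdivided star and the weights from the proof.

    Returns (parent, weight): parent pointers of the non-root vertices
    (the root is vertex 0) and the chosen vertex weights.
    """
    counts = Counter(splits)
    kappa = max(counts.values())  # maximal multiplicity in S
    # Chain j holds every value that occurs more than j times.
    chains = [sorted(v for v, c in counts.items() if c > j) for j in range(kappa)]
    weight = {0: max(splits)}     # the root gets the maximum value in S
    parent = {}
    next_id = 1
    for chain in chains:
        above = 0
        # Walk from the root towards the leaf: largest split first.
        for i in range(len(chain) - 1, -1, -1):
            v = next_id
            next_id += 1
            parent[v] = above
            # Leaf: the split itself; otherwise the difference of the
            # two splits on the incident edges.
            weight[v] = chain[i] - (chain[i - 1] if i > 0 else 0)
            above = v
    return parent, weight


def weighted_splits(parent, weight):
    """Compute the sorted multiset of splits of the weighted tree."""
    total = sum(weight.values())
    below = dict(weight)
    # Children have larger ids than their parents, so this is bottom-up.
    for v in sorted(parent, reverse=True):
        below[parent[v]] += below[v]
    return sorted(min(below[v], total - below[v]) for v in parent)
```

For $S = \{1, 2, 2, 3, 5, 5, 5\}$ the decomposition yields the three chains $1 < 2 < 3 < 5$, $2 < 5$ and $5$, and recomputing the splits of the resulting weighted tree gives back $S$.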



Remark. The construction in the proof of Theorem [10] shows that we are not only
always able to construct a tree $T$ as required, but that the structure of this tree is also rather simple. In
particular, the realization of the split sequence is a path if each split in $S$ repeats at most twice.

Observe that if we consider ChWSR with unit weights, we are back at the problem SR. It is not difficult
to see that in SR, a given set of splits can be realized in the same way as explained in the proof of
Theorem [10] for ChWSR, the only difference being that each time a non-unit weight $w$ is assigned to
some vertex $v$ in ChWSR, in SR we have to add $w - 1$ leaves of unit weight to $v$. Thus, if $S$ contains
a sufficient number of splits 1, then $S$ can always be realized by a tree. More precisely, setting the
boundary values $s_0^j := 0$ for all $j$, and letting $\kappa$ denote the maximum multiplicity over all elements in $S$
except 1, we have that if $\kappa \ge 2$ and $S$ contains at least

$$
\kappa + \sum_{j=1}^{\kappa} \sum_{i=1}^{m(j)} \left(s_i^j - s_{i-1}^j - 1\right) + 2 \cdot \max_{j=1}^{\kappa} \{s_{m(j)}^j\} - 1 - \sum_{j=1}^{\kappa} s_{m(j)}^j
$$

times the split 1, then $S$ can be realized by a tree $T$: $\kappa$ of them are needed to be assigned to edges
incident to leaves of the star, $\sum_{i=1}^{m(j)} (s_i^j - s_{i-1}^j - 1)$ of them are added, with pending vertices, to vertices
introduced by subdividing the edge $e_j$, and $2 \cdot \max_{j=1}^{\kappa} \{s_{m(j)}^j\} - 1 - \sum_{j=1}^{\kappa} s_{m(j)}^j$ of them are added, with
pending vertices, to the root. (Note that it does not matter if there are more splits 1 than needed in our
construction, since we may always add leaves to the center of $T$.) The previous condition is, of course,
sufficient, but not necessary. Moreover, the tree $T$ that realizes $S$ is a subdivided star to which some
leaves have been added. In particular, if each split in $S$ repeats at most twice, then we can realize $S$ in a
caterpillar with hair-length one. We note that the condition $\kappa = 2$ and the lower bound on the number
of splits with value 1 are also necessary for caterpillars with hair-length one.

7 Conclusion 

In Section [3] we have shown that WSR2 is in XP when parameterized by the number of distinct vertex
weights. It remains open whether this problem is fixed-parameter tractable (a generalization of the
problem is W[1]-hard [?]). For practical purposes, it would further be important to identify other quantities
that are small in practice (e.g. the diameter of the tree or topological indices), and investigate the
multivariate complexity of the considered problems parameterized by combinations of these quantities.
There is a large contrast between the complexities of WSR, where we are given $n$ vertex weights,
and ChWSR, where we can freely choose the vertex weights, or, alternatively, we can choose the vertex
weights from an infinite multiset containing $n$ times each element of $\mathbb{N}$. It would be interesting to know
some restrictions on the multiset of vertex weights such that the problem becomes polynomial-time
solvable, or fixed-parameter tractable with respect to interesting parameterizations, when we can choose
the weights from this multiset. Ideally, these restrictions should be consistent with the applications in
drug design and discovery.

Acknowledgment. We thank Ming-Yang Kao for communicating this problem.